add simple check for variables with different capitalization #84

Merged: 14 commits, Mar 5, 2024

fbenke-pik (Contributor) commented Feb 1, 2024

Implements a possible solution to address #82.

  • Checks for variables with differing capitalization and throws a warning.
  • The check increases the runtime of codeCheck by approximately 70 seconds.

codecov bot commented Feb 1, 2024

Codecov Report

Attention: Patch coverage is 60.31746%, with 25 lines in your changes missing coverage. Please review.

Project coverage is 41.65%. Comparing base (a246bdd) to head (ffc90cc).

Files                 Patch %   Lines
R/checkAppearance.R   59.32%    24 Missing ⚠️
R/codeCheck.R         75.00%    1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #84      +/-   ##
==========================================
+ Coverage   41.41%   41.65%   +0.23%     
==========================================
  Files          51       51              
  Lines        1671     1707      +36     
==========================================
+ Hits          692      711      +19     
- Misses        979      996      +17     


fbenke-pik (Contributor, Author) commented Feb 2, 2024

Managed to bring the runtime down to approximately 75 seconds by no longer grepping the entire codebase twice. Instead, I grep it only once with ignore.case = T and then run the second search, with ignore.case = F, on that intermediate result rather than on the whole codebase, since the case-sensitive matches are a subset of the case-insensitive ones.
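As a rough illustration of the two-pass idea (not the code merged in this PR), the sketch below runs one case-insensitive grep over a toy code vector and then a case-sensitive grep over just those hits. The helper name hasOtherCapitalization, the word-boundary pattern, and the toy lines are assumptions made for this example.

```r
# Toy stand-in for the model code; the real check runs on GAMS source lines.
code <- c(
  "set iso / DEU, FRA /;",
  "p_gdp(ISO) = 1;",
  "land(j) = 0;",
  "x(type) = 2;"
)

# Hypothetical helper (not part of the package): TRUE if `name` occurs on some
# line only in a capitalization other than the one given.
hasOtherCapitalization <- function(name, code) {
  pattern <- paste0("\\b", name, "\\b")
  # Pass 1: a single case-insensitive scan over the full code.
  anyCase <- grep(pattern, code, ignore.case = TRUE, value = TRUE)
  # Pass 2: case-sensitive scan over the pass-1 hits only, which is valid
  # because every case-sensitive match is also a case-insensitive match.
  exactCase <- grep(pattern, anyCase, ignore.case = FALSE, value = TRUE)
  # Limitation of this sketch: a line containing both spellings is counted
  # in both passes and therefore does not raise a flag on its own.
  length(anyCase) > length(exactCase)
}

flagged <- vapply(c("iso", "land", "type"), hasOtherCapitalization,
                  logical(1), code = code)
print(flagged)
```

The second pass only touches the handful of lines that survived the first one, which is where the saving over two full scans would come from.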

For future reference, the new check could also be implemented using stringr::str_extract_all instead of grep, but it seems to take much longer and to miss some valid results, so I am going with grep for now:

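# Collapse all code lines into one string so that a single search covers everything
code <- paste0(code, collapse = " ")

...

# For each declaration regex, flag names that appear in more than one capitalization
duplicates <- sapply(declarationsRegex, function(x) {
  chunks <- stringr::str_extract_all(code, stringr::regex(x, ignore_case = TRUE))[[1]]
  # More distinct raw matches than distinct lower-cased matches means case variants exist
  return(length(unique(chunks)) != length(unique(tolower(chunks))))
})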

@fbenke-pik fbenke-pik marked this pull request as ready for review February 6, 2024 09:21
fbenke-pik (Contributor, Author) commented Feb 23, 2024

@tscheypidi I can add a list of lines per found name, but it seems a bit tricky to print a meaningful warning, as the list of affected lines can be quite long and the warning will be truncated:

Warning message:
Found variables with more than one capitalization in the codebase: iso, land, type, age, k
 Details for item 'land':
 vm_lu_transitions(j,land_from,land_to) Land transitions between time steps (mio. ha)
 q10_transition_matrix(j)             Land transition constraint cell area (mio. ha)
 q10_transition_to(j,land_to)       Land transition constraint to (mio. ha)
 q10_transition_from(j,land_from)   Land transition constraint from (mio. ha)
 q10_landexpansion(j,land_to)       Land expansion constraint (mio. ha)
 q10_landreduction(j,land_from)     Land reduction constraint (mio. ha)
 q10_landdiff                         Land difference constraint (mio. ha)
 ov_lu_transitions(t,j,land_from,land_to,type) Land transitions between time steps (mio. ha)
 oq10_transition_matrix(t,j,type)              Land transition constraint cell area (mio. ha)
 oq10_transition_to(t,j,land_to,type)          Land transition constraint to (mio. ha)
 oq1 [... truncated] 

How about introducing a helper that accepts a variable name and returns the respective lines and is run outside of the check instead?

Or better use an ignore list after all?
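
A stand-alone helper of the kind proposed here could be as small as the sketch below. The name linesForName and the word-boundary pattern are assumptions for illustration; the sample lines are taken from the warning output in this thread.

```r
# Hypothetical stand-alone helper: return every line of `code` that mentions
# `name` in any capitalization, so it can be inspected outside the check.
linesForName <- function(name, code) {
  grep(paste0("\\b", name, "\\b"), code, ignore.case = TRUE, value = TRUE)
}

code <- c(
  "$setglobal c37_labor_metric  ISO",
  "    / ISO, HOTHAPS /",
  "q10_landdiff  Land difference constraint (mio. ha)"
)
print(linesForName("iso", code))
```

Running it on demand for a single name would avoid pushing long line lists through a warning message.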

tscheypidi (Member) commented

I think this list of lines can be useful even if partly truncated, as it gives you an indication of what kind of instances were found.

fbenke-pik (Contributor, Author) commented Feb 26, 2024

Ok, I have added the detailed message. Are we good to go then?

Warning message:
Found variables with more than one capitalization in the codebase: iso, land, type, age, k
- Lines found for item 'iso':
$setglobal c37_labor_metric  ISO
    / ISO, HOTHAPS /
- Lines found for item 'land':
 vm_lu_transitions(j,land_from,land_to) Land transitions between time steps (mio. ha)
 q10_transition_matrix(j)             Land transition constraint cell area (mio. ha)
 q10_transition_to(j,land_to)       Land transition constraint to (mio. ha)
 q10_transition_from(j,land_from)   Land transition constraint from (mio. ha)
 q10_landexpansion(j,land_to)       Land expansion constraint (mio. ha)
 q10_landreduction(j,land_from)     Land reduction constraint (mio. ha)
 q10_landdiff                         Land difference constraint (mio. ha)
 ov_lu_transitions(t,j,land_from,land_to,type) Land transitions between time steps (mio. ha)
 oq10_transition_matrix(t,j,type)              Land transition constraint cell area (mio. ha)
 oq10_transition_to(t,j,land_to,type)          Land transition  [... truncated] 
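
A message in this shape could be assembled roughly as follows. This is a sketch under assumptions, not the merged implementation: the helper name buildCapitalizationMessage and the named-list input are made up for illustration. R cuts warning text off at getOption("warning.length") characters (1000 by default), which is probably where the truncation marker comes from.

```r
# Hypothetical helper: build the detailed message from a named list that maps
# each flagged name to the lines on which it was found.
buildCapitalizationMessage <- function(linesByName) {
  header <- paste0(
    "Found variables with more than one capitalization in the codebase: ",
    paste(names(linesByName), collapse = ", ")
  )
  details <- vapply(names(linesByName), function(name) {
    paste0("- Lines found for item '", name, "':\n",
           paste(linesByName[[name]], collapse = "\n"))
  }, character(1))
  paste(c(header, details), collapse = "\n")
}

# Sample input reusing the 'iso' lines shown in the warning above.
found <- list(
  iso = c("$setglobal c37_labor_metric  ISO", "    / ISO, HOTHAPS /")
)
msg <- buildCapitalizationMessage(found)
cat(msg, "\n")

# warning(msg) would truncate anything beyond this many characters.
print(getOption("warning.length"))
```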

fbenke-pik (Contributor, Author) commented

I added support for an exclusion list. See also magpiemodel/magpie#641

tscheypidi (Member) left a comment

Looks good, thanks a lot! Just two minor requests concerning the configuration file:

R/codeCheck.R Outdated
ap <- checkAppearance(gams)
capitalExclusionList <- NULL
if (file.exists(file.path(path, ".codeCheck.yaml"))) {
  capitalExclusionList <- read_yaml(file.path(path, ".codeCheck.yaml"))[["capitalExclusionList"]]

Two suggestions:

  • Can we use the name .codeCheck instead of .codeCheck.yaml? This seems to me more consistent with all the other config files like .lintr, .Rprofile, .buildLibrary and so on.
  • Can you add some information about .codeCheck to the documentation of codeCheck so that users are aware of it and know which settings are available?

Suggested change:
- capitalExclusionList <- read_yaml(file.path(path, ".codeCheck.yaml"))[["capitalExclusionList"]]
+ capitalExclusionList <- read_yaml(file.path(path, ".codeCheck"))[["capitalExclusionList"]]

fbenke-pik (Contributor, Author) commented

I added both.
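
For readers unfamiliar with the setup, reading the exclusion list from the renamed file might look roughly like the sketch below. It assumes that .codeCheck is a YAML file with a capitalExclusionList entry (as the reviewed code suggests), that the yaml package is installed, and that excluded names are matched exactly. The two helper names are hypothetical.

```r
# Sketch only: read the exclusion list from a `.codeCheck` file in `path`.
# Assumes the file is YAML with a `capitalExclusionList` entry.
readCapitalExclusionList <- function(path) {
  configFile <- file.path(path, ".codeCheck")
  if (!file.exists(configFile)) {
    return(NULL)  # no config file: nothing is excluded
  }
  yaml::read_yaml(configFile)[["capitalExclusionList"]]
}

# Hypothetical use: drop excluded names (exact matches) from the flagged ones.
applyExclusionList <- function(flagged, exclusionList) {
  setdiff(flagged, exclusionList)
}

flagged <- c("iso", "land", "type", "age", "k")
print(applyExclusionList(flagged, c("iso", "k")))

# A directory without a `.codeCheck` file leaves the flagged names unchanged.
print(applyExclusionList(flagged, readCapitalExclusionList(tempfile())))
```

Returning NULL when the file is absent keeps the check usable in repositories that have no configuration at all.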

tscheypidi (Member) left a comment

great, thanks again!

@fbenke-pik fbenke-pik merged commit 6c17ae3 into pik-piam:master Mar 5, 2024
4 checks passed
@fbenke-pik fbenke-pik deleted the variableCheck branch March 5, 2024 10:46